Introduction

The purpose of this supplementary file is to reproduce data analysis of artefact compositions and isotopic signals of objects from the Duchcov hoard from the article Exceptional ritual or a production deposit? The Duchcov hoard (Bohemia) as a proxy to ´Celtic migrations´ in the 4th century BC.

R version 3.6.3 was used (R Core Team, 2020) to run the analysis. Following R packages are used: tidyverse family of packages (Wickham et al., 2019), ggforce (Pedersen, 2019), ggbiplot (Vu, 2011), ggridges (Wilke, 2020), cluster (Maechler et al., 2019), pdist (Wong, 2013), StatMatch (D’Orazio, 2019), MASS (Venables and Ripley, 2002), cowplot (Wilke, 2019), corrplot (Wei and Simko, 2017), knitr (Xie, 2020a), DT (Xie et al., 2020), bookdown (Xie, 2020b), grid (R Core Team, 2020), gridExtra (Auguie, 2017), here(Müller, 2017).

knitr::opts_chunk$set(echo = TRUE, warning = FALSE, message = FALSE, fig.align = "center")
set.seed(42)
library(ggforce)
library(tidyverse)
duchcov <- read_csv(here::here("data", "duchcov.csv"))

Composition analysis

composition <- duchcov %>%
  select(Li:Bi) %>%
  as.matrix()
rownames(composition) <- duchcov$id

# elements selected for PCA and further analyses
elements <- c("Co", "Ni", "Zn", "As", "Ag", "Sb", "Pb")

DT::datatable(composition[, elements], caption = "Compositions table")

Only Co, Ni, Zn, As, Ag, Sb, Pb element compositions of 49 analyzed artefacts from Duchcov hoard were used for the analyses.

corrplot::corrplot(cor(composition[, elements]),
                   type = "upper", diag = FALSE,
                   order = "hclust", method = "pie")

# values are scaled to mean 0 and sd 1
composition_biplot <- scale(composition[, elements])
# solve overplotting in variable names on a biplot
colnames(composition_biplot)[4] <- "\nAs"
colnames(composition_biplot)[2] <- "\nNi"

pc_comp <- prcomp(composition_biplot)

# First 5 PCs are used in subsequent analysis to retain close to 95% of explained variance
summary(pc_comp)
## Importance of components:
##                           PC1    PC2    PC3    PC4     PC5     PC6     PC7
## Standard deviation     1.7159 1.1722 0.9935 0.9010 0.66765 0.51310 0.41717
## Proportion of Variance 0.4206 0.1963 0.1410 0.1160 0.06368 0.03761 0.02486
## Cumulative Proportion  0.4206 0.6169 0.7579 0.8739 0.93753 0.97514 1.00000
# ggbiplot::ggbiplot(pc_comp)
rm(list = c("composition_biplot"))

Grouping in the compositions

Elbow method and silhouette method are used to determine number of clusters in the dataset.

plot_scree_clusters <- function(x) {
  wss <- 0
  max_i <- (nrow(x) - 1)/4
  for (i in 1:max_i) {
    km.model <- kmeans(x, centers = i, nstart = 20)
    wss[i] <- km.model$tot.withinss
  }
  plot(1:max_i, wss, type = "b",
       xlab = "Number of Clusters",
       ylab = "Within groups sum of squares")
}

plot_scree_clusters(pc_comp$x[, 1:5])
Scree plot

Figure 1: Scree plot

plot_sil_width <- function(x) {
  sw <- 0
  max_i <- (nrow(x) - 1)/4
  for (i in 2:max_i) {
    km.model <- cluster::pam(x = pc_comp$x, k = i)
    sw[i] <- km.model$silinfo$avg.width
  }
  sw <- sw[-1]
  plot(2:max_i, sw, type = "b",
       xlab = "Number of Clusters",
       ylab = "Average silhouette width")
}

plot_sil_width(pc_comp$x[, 1:5])
Silhouettes

Figure 2: Silhouettes

# ------------------------------------------------------------------------------
# please note: to avoid confusion in changes in k-means cluster numbers/colors
# the cluster assignment was hardcoded in a csv file 'clusters.csv'
# ------------------------------------------------------------------------------
# cluster <- tibble(kmeans = kmeans(pc_comp$x[, 1:5], centers = 4, nstart = 50)$cluster) %>% 
#   mutate(kmeans = as_factor(kmeans))
# write_csv(cluster), "../data/clusters.csv")
cluster <- read_csv(here::here("data", "clusters.csv")) %>%
  mutate(kmeans = factor(kmeans))
ggplot(as_tibble(pc_comp$x), aes(PC1, PC2, fill = cluster$kmeans)) +
  geom_mark_hull(aes(color = cluster$kmeans), expand = unit(2.4, "mm")) +
  geom_point(aes(shape = cluster$kmeans, color = cluster$kmeans)) +
  scale_fill_brewer(palette = "Set1") + scale_color_brewer(palette = "Set1") +
  labs(fill = "k-means\ncluster", color = "k-means\ncluster", shape = "k-means\ncluster")
K-means clusters in PCA space

Figure 3: K-means clusters in PCA space

Mean values for different element compositions in the clusters 1 to 4:

km_elements <- as_tibble(cbind(composition[, elements], cluster)) %>%
  group_by(kmeans) %>%
  summarize_all(mean) %>%
  mutate_at(vars(-kmeans), round, 4)

knitr::kable(km_elements, 
             rownames = FALSE, caption = "Mean values for element compositions")
Table 1: Mean values for element compositions
kmeans Co Ni Zn As Ag Sb Pb
1 184.7500 0.1987 0.0733 0.2133 0.0295 0.1443 2.3779
2 38.9677 0.0377 0.0548 0.0939 0.0112 0.0412 0.2274
3 0.0000 0.0263 1.8100 0.0567 0.0057 0.0370 0.1713
4 971.0000 0.0710 0.1900 0.5400 0.0303 0.0623 0.1907
rm(list = c("km_elements"))

Lead isotopes

isotopes <- duchcov %>%
  select(id, starts_with("Pb2"))

isotopes <- bind_cols(isotopes, cluster)

DT::datatable(isotopes, rownames = FALSE)
lab_206_207 <- labs(x = expression(paste(""^206, "Pb/", ""^204, "Pb")),
                    y = expression(paste(""^207, "Pb/", ""^204, "Pb")))
lab_206_208 <- labs(x = expression(paste(""^206, "Pb/", ""^204, "Pb")),
                    y = expression(paste(""^208, "Pb/", ""^204, "Pb")))
lab_207_208 <- labs(x = expression(paste(""^207, "Pb/", ""^206, "Pb")),
                    y = expression(paste(""^208, "Pb/", ""^206, "Pb")))
ggplot(isotopes, aes(Pb206_204, Pb207_204)) +
  geom_point(aes(color = kmeans, shape = kmeans), size = 1.6) +
  scale_color_brewer(palette = "Set1", name = "k-means\ncluster") +
  labs(shape = "k-means\ncluster") + lab_206_207
K-means clusters in isospace

Figure 4: K-means clusters in isospace

ggplot(isotopes, aes(Pb206_204, Pb208_204)) +
  geom_point(aes(color = kmeans, shape = kmeans), size = 1.6) +
  scale_color_brewer(palette = "Set1", name = "k-means\ncluster") +
  labs(shape = "k-means\ncluster") + lab_206_208
K-means clusters in isospace

Figure 5: K-means clusters in isospace

ggplot(isotopes, aes(Pb207_206, Pb208_206)) +
  geom_point(aes(color = kmeans, shape = kmeans), size = 1.6) +
  scale_color_brewer(palette = "Set1", name = "k-means\ncluster") +
  labs(shape = "k-means\ncluster") + lab_207_208
K-means clusters in isospace

Figure 6: K-means clusters in isospace

rm(list = c("plot_scree_clusters", "plot_sil_width"))

Ore sources

Here we compare the distributions of Duchcov k-means clusters with distributions of known sources based on lead isotopes.

sources <- read_csv(here::here("data", "ore_sources.csv")) %>%
  mutate(Region = factor(Region,
                         levels = c("Valais", "Rheinland", "Saarland",
                                    "Rammelsberg & Harz",
                                    "E Alps", "SE Alps", "Erzgebirge", 
                                    "Slovakia", "E Carpathians")),
         Region2 = factor(Region2,
                          levels = c("Valais", "Rheinland", "Saarland",
                                     "Rammelsberg", "Harz",
                                     "E Alps", "SE Alps", "Erzgebirge", 
                                     "Slovakia", "E Carpathians")))

# frame margins of further plots
dist_duchcov <- vector(mode = "list")
dist_duchcov$Pb206x$min <-  min(isotopes$Pb206_204, na.rm = TRUE)
dist_duchcov$Pb206x$max <-  max(isotopes$Pb206_204, na.rm = TRUE)
dist_duchcov$Pb207y$min <-  min(isotopes$Pb207_204, na.rm = TRUE)
dist_duchcov$Pb207y$max <-  max(isotopes$Pb207_204, na.rm = TRUE)
dist_duchcov$Pb208_204y$min <-  min(isotopes$Pb208_204, na.rm = TRUE)
dist_duchcov$Pb208_204y$max <-  max(isotopes$Pb208_204, na.rm = TRUE)

dist_duchcov$Pb208y$min <-  min(isotopes$Pb208_206, na.rm = TRUE)
dist_duchcov$Pb208y$max <-  max(isotopes$Pb208_206, na.rm = TRUE)
dist_duchcov$Pb207x$min <-  min(isotopes$Pb207_206, na.rm = TRUE)
dist_duchcov$Pb207x$max <-  max(isotopes$Pb207_206, na.rm = TRUE)

Points removed as outliers are highlighted in red in the following figures. Outliers are defined as points lying further than 2 interquartile ranges from the upper quartile of the Mahalanobis distances from the data distribution center in a multidimensional space defined by isotopic signals \(^{206}Pb/^{204}Pb\), \(^{207}Pb/^{204}Pb\), \(^{208}Pb/^{204}Pb\), \(^{207}Pb/^{206}Pb\) and \(^{208}Pb/^{206}Pb\).

ggplot(sources, aes(Pb206_204, Pb207_204)) +
  geom_point(aes(color = Outlier), alpha = 0.4) +
  facet_wrap(~Region, scales = "free") +
  scale_color_manual(values = c("forestgreen", "red")) +
  lab_206_207
Ore sources outliers

Figure 7: Ore sources outliers

ggplot(sources, aes(Pb206_204, Pb208_204)) +
  geom_point(aes(color = Outlier), alpha = 0.4) +
  facet_wrap(~Region, scales = "free") +
  scale_color_manual(values = c("forestgreen", "red")) +
  lab_206_208
Ore sources outliers

Figure 8: Ore sources outliers

ggplot(sources, aes(Pb207_206, Pb208_206)) +
  geom_point(aes(color = Outlier), alpha = 0.4) +
  facet_wrap(~Region, scales = "free") +
  scale_color_manual(values = c("forestgreen", "red")) +
  lab_207_208
Ore sources outliers

Figure 9: Ore sources outliers

sources <- sources %>% filter(Outlier == FALSE)
gg_206_207 <- ggplot(sources, aes(Pb206_204, Pb207_204)) +
  lab_206_207

gg_206_208 <- ggplot(sources, aes(Pb206_204, Pb208_204)) +
  lab_206_208

gg_207_208 <- ggplot(sources, aes(Pb207_206, Pb208_206)) +
  lab_207_208

Overall view of distributions of lead isotope values.

gg_206_207 +
  geom_point(alpha = 0.2, aes(color = Region)) +
  scale_color_viridis_d(direction = -1, end = 0.9) +
  annotate("rect",
           xmin = dist_duchcov$Pb206x$min, xmax = dist_duchcov$Pb206x$max,
           ymin = dist_duchcov$Pb207y$min, ymax = dist_duchcov$Pb207y$max, alpha = 0.2) +
  geom_point(data = isotopes, aes(Pb206_204, Pb207_204), shape = "cross")
Ore sources overlayed with Duchcov hoard data

Figure 10: Ore sources overlayed with Duchcov hoard data

gg_206_208 +
  geom_point(alpha = 0.2, aes(color = Region)) +
  scale_color_viridis_d(direction = -1, end = 0.9) +
  annotate("rect",
           xmin = dist_duchcov$Pb206x$min, xmax = dist_duchcov$Pb206x$max,
           ymin = dist_duchcov$Pb208_204y$min, ymax = dist_duchcov$Pb208_204y$max, alpha = 0.2) +
  geom_point(data = isotopes, aes(Pb206_204, Pb208_204), shape = "cross")
Ore sources overlayed with Duchcov hoard data

Figure 11: Ore sources overlayed with Duchcov hoard data

gg_207_208 +
  geom_point(alpha = 0.2, aes(color = Region)) +
  scale_color_viridis_d(direction = -1, end = 0.9) +
  annotate("rect",
           xmin = dist_duchcov$Pb207x$min, xmax = dist_duchcov$Pb207x$max,
           ymin = dist_duchcov$Pb208y$min, ymax = dist_duchcov$Pb208y$max, alpha = 0.2) +
  geom_point(data = isotopes, aes(Pb207_206, Pb208_206), shape = "cross")
Ore sources overlayed with Duchcov hoard data

Figure 12: Ore sources overlayed with Duchcov hoard data

gg_206_207 <- gg_206_207 + 
  coord_cartesian(xlim = c(dist_duchcov$Pb206x$min, dist_duchcov$Pb206x$max),
                  ylim = c(dist_duchcov$Pb207y$min, dist_duchcov$Pb207y$max)) +
  guides(shape = guide_legend(ncol=2), color = guide_legend(ncol = 2)) +
  labs(shape = "K-means cluster")

gg_206_208 <- gg_206_208 + 
  coord_cartesian(xlim = c(dist_duchcov$Pb206x$min, dist_duchcov$Pb206x$max),
                  ylim = c(dist_duchcov$Pb208_204y$min, dist_duchcov$Pb208_204y$max)) +
  guides(shape = guide_legend(ncol=2), color = guide_legend(ncol = 2)) +
  labs(shape = "K-means cluster")

gg_207_208 <- gg_207_208 + 
  coord_cartesian(xlim = c(dist_duchcov$Pb207x$min, dist_duchcov$Pb207x$max),
                  ylim = c(dist_duchcov$Pb208y$min, dist_duchcov$Pb208y$max)) +
  guides(shape = guide_legend(ncol=2), color = guide_legend(ncol = 2)) +
  labs(shape = "K-means cluster")

Several visual methods are explored in order to determine whether the lead isotopic signal from Duchcov hoard objects grouped by k-means clustering correspond to known sources. At first, areas corresponding to lead isotope signals from different geographic regions are constructed in isospaces using concave hulls (Gombin et al., 2017, concaveman algorithm) of the point distributions for given geographic regions.

# gg_206_207 +
#   geom_point(alpha = 0.2, aes(color = Region2), show.legend = FALSE) +
#   geom_mark_hull(aes(fill = Region2, color = Region2), expand = unit(2.4, "mm"),
#                  alpha = 0.1, show.legend = FALSE) +
#   scale_color_viridis_d(direction = -1, end = 0.9) + scale_fill_viridis_d(direction = -1) +
#   geom_point(data = isotopes, aes(shape = kmeans), alpha = 0.6) +
#   facet_wrap(~Region)
# 
# gg_206_208 +
#   geom_point(alpha = 0.2, aes(color = Region2), show.legend = FALSE) +
#   geom_mark_hull(aes(fill = Region2, color = Region2), expand = unit(2.4, "mm"),
#                  alpha = 0.1, show.legend = FALSE) +
#   scale_color_viridis_d(direction = -1, end = 0.9) + scale_fill_viridis_d(direction = -1) +
#   geom_point(data = isotopes, aes(shape = kmeans), alpha = 0.6) +
#   facet_wrap(~Region)
# 
# gg_207_208 +
#   geom_point(alpha = 0.2, aes(color = Region2), show.legend = FALSE) +
#   geom_mark_hull(aes(fill = Region2, color = Region2), expand = unit(2.4, "mm"),
#                  alpha = 0.1, show.legend = FALSE) +
#   scale_color_viridis_d(direction = -1, end = 0.9) + scale_fill_viridis_d(direction = -1) +
#   geom_point(data = isotopes, aes(shape = kmeans), alpha = 0.6) +
#   facet_wrap(~Region)

Kernel density estimation for each of the source regions helps in identifying to what extent the point distribution of Duchcov hoard objects fits the density distribution in given regions. kde2d algorithm of MASS package is used (Venables and Ripley, 2002).

gg_206_207 +
  stat_density2d(aes(fill = stat(nlevel)), geom = "polygon", alpha = 0.2, show.legend = FALSE) +
  scale_fill_gradient(low = "white", high = "black") +
  geom_point(alpha = 0.2, size = 0.2, color = "black") +
  geom_point(data = isotopes, aes(shape = kmeans, color = kmeans)) +
  scale_color_brewer(palette = "Set1", name = "K-means cluster") +
  facet_wrap(~ Region)
Kernel density estimation of ore sources

Figure 13: Kernel density estimation of ore sources

gg_206_208 +
  stat_density2d(aes(fill = stat(nlevel)), geom = "polygon", alpha = 0.2, show.legend = FALSE) +
  scale_fill_gradient(low = "white", high = "black") +
  geom_point(alpha = 0.2, size = 0.2, color = "black") +
  geom_point(data = isotopes, aes(shape = kmeans, color = kmeans)) +
  scale_color_brewer(palette = "Set1", name = "K-means cluster") +
  facet_wrap(~ Region)
Kernel density estimation of ore sources

Figure 14: Kernel density estimation of ore sources

gg_207_208 +
  stat_density2d(aes(fill = stat(nlevel)), geom = "polygon", alpha = 0.2, show.legend = FALSE) +
  scale_fill_gradient(low = "white", high = "black") +
  geom_point(alpha = 0.2, size = 0.2, color = "black") +
  geom_point(data = isotopes, aes(shape = kmeans, color = kmeans)) +
  scale_color_brewer(palette = "Set1", name = "K-means cluster") +
  facet_wrap(~ Region)
Kernel density estimation of ore sources

Figure 15: Kernel density estimation of ore sources

rm(list = c("dist_duchcov"))

Euclidean distances

Euclidean distance between all the points in given geographic region and k-means clusters from Duchcov hoard is used as one of the proxies to identify possible sources of the lead isotope ratios \(^{208}Pb/^{204}Pb\), \(^{207}Pb/^{204}Pb\), \(^{206}Pb/^{204}Pb\), \(^{208}Pb/^{206}Pb\) and \(^{207}Pb/^{206}Pb\).

The problem with using Euclidean distances is that the multivariate shape of the data distribution is not taken into account, thus Mahalanobis distance (see further) seems as a better fit for the task, even though Euclidean distances are commonly used (e.g. Ling et al., 2014).

nest_matrix <- function(df, gr) {
  output <- df %>% group_by(!!sym(gr)) %>%
    nest() %>% 
    mutate(mx = map(data, as.matrix))
  return(output)
}

point_cloud_distance <- function(origin, goal, group_origin, group_goal, method) {
  # create nested matrices
  x <- nest_matrix(origin, group_origin)
  y <- nest_matrix(goal, group_goal)
  lx <- nrow(x)
  ly <- nrow(y)
  # create empty list for results
  distance <- vector(mode = "list", length = ly)
  for (k in seq_along(distance)) {
    distance[[k]] <- vector(mode = "list", length = lx)
    names(distance[[k]]) <- x %>% pull(!!sym(group_origin))
  }
  names(distance) <- y %>% pull(!!sym(group_goal)) %>% str_c("K-means cluster ", .)
  # count euclidean distances
  if (method == "euclidean") {
    for (i in 1:ly) {
      for (j in 1:lx) {
        distance[[i]][[j]] <- pdist::pdist(X = y$mx[[i]], Y = x$mx[[j]])@dist
      }
      distance[[i]] <- bind_rows(lapply(distance[[i]], as_tibble), .id = "region")
    }
    # mahalanobis distance
  } else if (method == "mahalanobis") {
    for (i in 1:ly) {
      for (j in 1:lx) {
        distance[[i]][[j]] <- StatMatch::mahalanobis.dist(data.x = y$mx[[i]],
                                                          data.y = x$mx[[j]])
      }
      distance[[i]] <- bind_rows(
        lapply(lapply(distance[[i]], as.vector), as_tibble), .id = "region")
    }
  }
  distance <- bind_rows(distance, .id = "kmeans")
  return(distance)
}
src <- sources %>% select(Region2, starts_with("Pb"))
iso <- isotopes %>% select(kmeans, starts_with("Pb")) %>% na.omit()

euclidean_distance <- point_cloud_distance(origin = src, goal = iso, 
                                           group_origin = "Region2", group_goal = "kmeans",
                                           method = "euclidean") %>% 
  mutate(region = fct_relevel(region, levels(sources$Region2)))
# ggplot(euclidean_distance, aes(y = region, x = value, fill = region)) +
#   ggridges::geom_density_ridges(alpha = 0.4, show.legend = FALSE, scale = 1.1) +
#   scale_fill_viridis_d(direction = -1, end = 0.9) +
#   coord_cartesian(xlim = c(0, 1)) +
#   facet_wrap(~kmeans, scales = "fixed") +
#   labs(y = "Region",
#        x = "Euclidean distance to a given Duchcov K-means cluster (x axis limited to interval 0 - 1)")

Mahalanobis distances

Mahalanobis distance is used as a metric to compare points fit in a given source distribution. The scale differences and effects of correlation between the variables are removed in case of Mahalanobis distance.

mahalanobis_distance <- point_cloud_distance(origin = src, goal = iso, 
                                             group_origin = "Region2", group_goal = "kmeans",
                                             method = "mahalanobis") %>% 
  mutate(region = fct_relevel(region, levels(sources$Region2)))
# ggplot(mahalanobis_distance, aes(y = region, x = value, fill = region)) +
#   ggridges::geom_density_ridges(alpha = 0.4, show.legend = FALSE, scale = 1.1) +
#   scale_fill_viridis_d(direction = -1, end = 0.9) +
#   coord_cartesian(xlim = c(0, 10)) +
#   facet_wrap(~kmeans, scales = "fixed") +
#   labs(y = "Region",
#        x = "Mahalanobis distance to a given Duchcov K-means cluster (x axis limited to interval 0 - 10)")
rm(list = c("iso", "src", "nest_matrix", "point_cloud_distance"))

Discriminant analysis

Manova is used in order to determine whether there are significant differences between group means for the distinct regions.

sources_lm <- lm(as.matrix(sources[, c("Pb207_206", "Pb208_206",
                                       "Pb206_204", "Pb207_204", "Pb208_204")]) ~ sources$Region2)

summary(sources_lm)
## Response Pb207_206 :
## 
## Call:
## lm(formula = Pb207_206 ~ sources$Region2)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.072955 -0.002791 -0.000101  0.002776  0.051416 
## 
## Coefficients:
##                                Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                   0.8388941  0.0010631 789.082  < 2e-16 ***
## sources$Region2Rheinland      0.0111263  0.0011240   9.899  < 2e-16 ***
## sources$Region2Saarland       0.0124048  0.0013333   9.304  < 2e-16 ***
## sources$Region2Rammelsberg    0.0166259  0.0013021  12.769  < 2e-16 ***
## sources$Region2Harz           0.0076641  0.0011633   6.588 6.14e-11 ***
## sources$Region2E Alps        -0.0265394  0.0013333 -19.905  < 2e-16 ***
## sources$Region2SE Alps        0.0211837  0.0013381  15.831  < 2e-16 ***
## sources$Region2Erzgebirge     0.0134693  0.0013679   9.847  < 2e-16 ***
## sources$Region2Slovakia       0.0003137  0.0014579   0.215 0.829668    
## sources$Region2E Carpathians -0.0056094  0.0014480  -3.874 0.000112 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.008166 on 1500 degrees of freedom
## Multiple R-squared:  0.6415, Adjusted R-squared:  0.6394 
## F-statistic: 298.3 on 9 and 1500 DF,  p-value: < 2.2e-16
## 
## 
## Response Pb208_206 :
## 
## Call:
## lm(formula = Pb208_206 ~ sources$Region2)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.121284 -0.003497  0.000101  0.004379  0.099886 
## 
## Coefficients:
##                               Estimate Std. Error  t value Pr(>|t|)    
## (Intercept)                   2.070344   0.001880 1101.117  < 2e-16 ***
## sources$Region2Rheinland      0.015886   0.001988    7.992 2.63e-15 ***
## sources$Region2Saarland       0.018247   0.002358    7.738 1.84e-14 ***
## sources$Region2Rammelsberg    0.016893   0.002303    7.336 3.59e-13 ***
## sources$Region2Harz           0.011947   0.002057    5.807 7.75e-09 ***
## sources$Region2E Alps        -0.031269   0.002358  -13.260  < 2e-16 ***
## sources$Region2SE Alps        0.038589   0.002367   16.306  < 2e-16 ***
## sources$Region2Erzgebirge     0.022512   0.002419    9.305  < 2e-16 ***
## sources$Region2Slovakia       0.006566   0.002578    2.547   0.0110 *  
## sources$Region2E Carpathians -0.006388   0.002561   -2.495   0.0127 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.01444 on 1500 degrees of freedom
## Multiple R-squared:  0.5075, Adjusted R-squared:  0.5045 
## F-statistic: 171.7 on 9 and 1500 DF,  p-value: < 2.2e-16
## 
## 
## Response Pb206_204 :
## 
## Call:
## lm(formula = Pb206_204 ~ sources$Region2)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.20680 -0.07042  0.00256  0.06460  2.03536 
## 
## Coefficients:
##                              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                  18.68180    0.02641 707.475  < 2e-16 ***
## sources$Region2Rheinland     -0.30778    0.02792 -11.024  < 2e-16 ***
## sources$Region2Saarland      -0.33292    0.03312 -10.053  < 2e-16 ***
## sources$Region2Rammelsberg   -0.44180    0.03234 -13.661  < 2e-16 ***
## sources$Region2Harz          -0.22234    0.02889  -7.695 2.55e-14 ***
## sources$Region2E Alps         0.68584    0.03312  20.710  < 2e-16 ***
## sources$Region2SE Alps       -0.46861    0.03324 -14.099  < 2e-16 ***
## sources$Region2Erzgebirge    -0.37961    0.03398 -11.173  < 2e-16 ***
## sources$Region2Slovakia      -0.01786    0.03621  -0.493  0.62201    
## sources$Region2E Carpathians  0.10916    0.03597   3.035  0.00245 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.2028 on 1500 degrees of freedom
## Multiple R-squared:  0.6543, Adjusted R-squared:  0.6523 
## F-statistic: 315.5 on 9 and 1500 DF,  p-value: < 2.2e-16
## 
## 
## Response Pb207_204 :
## 
## Call:
## lm(formula = Pb207_204 ~ sources$Region2)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.130320 -0.009828 -0.001337  0.008172  0.120172 
## 
## Coefficients:
##                               Estimate Std. Error  t value Pr(>|t|)    
## (Intercept)                  15.667000   0.002466 6351.939  < 2e-16 ***
## sources$Region2Rheinland     -0.049172   0.002608  -18.857  < 2e-16 ***
## sources$Region2Saarland      -0.046937   0.003093  -15.174  < 2e-16 ***
## sources$Region2Rammelsberg   -0.062360   0.003021  -20.644  < 2e-16 ***
## sources$Region2Harz          -0.040025   0.002699  -14.830  < 2e-16 ***
## sources$Region2E Alps         0.053433   0.003093   17.274  < 2e-16 ***
## sources$Region2SE Alps       -0.004218   0.003104   -1.359  0.17446    
## sources$Region2Erzgebirge    -0.067630   0.003174  -21.310  < 2e-16 ***
## sources$Region2Slovakia      -0.005942   0.003382   -1.757  0.07918 .  
## sources$Region2E Carpathians -0.009246   0.003359   -2.752  0.00599 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.01895 on 1500 degrees of freedom
## Multiple R-squared:  0.7194, Adjusted R-squared:  0.7177 
## F-statistic: 427.2 on 9 and 1500 DF,  p-value: < 2.2e-16
## 
## 
## Response Pb208_204 :
## 
## Call:
## lm(formula = Pb208_204 ~ sources$Region2)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.15273 -0.08994  0.01020  0.09711  1.75943 
## 
## Coefficients:
##                              Estimate Std. Error  t value Pr(>|t|)    
## (Intercept)                  38.66768    0.02883 1341.117  < 2e-16 ***
## sources$Region2Rheinland     -0.33674    0.03048  -11.047  < 2e-16 ***
## sources$Region2Saarland      -0.34485    0.03616   -9.537  < 2e-16 ***
## sources$Region2Rammelsberg   -0.59640    0.03531  -16.889  < 2e-16 ***
## sources$Region2Harz          -0.22972    0.03155   -7.281 5.32e-13 ***
## sources$Region2E Alps         0.80363    0.03616   22.225  < 2e-16 ***
## sources$Region2SE Alps       -0.26015    0.03629   -7.169 1.18e-12 ***
## sources$Region2Erzgebirge    -0.36471    0.03710   -9.831  < 2e-16 ***
## sources$Region2Slovakia       0.09271    0.03954    2.345  0.01917 *  
## sources$Region2E Carpathians  0.11509    0.03927    2.931  0.00343 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.2215 on 1500 degrees of freedom
## Multiple R-squared:  0.6736, Adjusted R-squared:  0.6716 
## F-statistic:   344 on 9 and 1500 DF,  p-value: < 2.2e-16
summary(manova(sources_lm), test = "Pillai")
##                   Df Pillai approx F num Df den Df    Pr(>F)    
## sources$Region2    9  1.663   83.061     45   7500 < 2.2e-16 ***
## Residuals       1500                                            
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# summary(manova(sources_lm), test = "Wilks")
# summary(manova(sources_lm), test = "Roy")
# summary(manova(sources_lm), test = "Hotelling-Lawley")

Linear discriminant analysis is then used to separate the regions.

n_levels_sources <- length(levels(sources$Region2))

sources_lda <- MASS::lda(Region2 ~ ., sources[, c("Region2", 
                                                  "Pb207_206", "Pb208_206", 
                                                  "Pb206_204", "Pb207_204", "Pb208_204")],
                         prior = as.double(
                           paste(
                             rep(1/n_levels_sources, 
                                 n_levels_sources), sep = ",")))
sources_lda
## Call:
## lda(Region2 ~ ., data = sources[, c("Region2", "Pb207_206", "Pb208_206", 
##     "Pb206_204", "Pb207_204", "Pb208_204")], prior = as.double(paste(rep(1/n_levels_sources, 
##     n_levels_sources), sep = ",")))
## 
## Prior probabilities of groups:
##        Valais     Rheinland      Saarland   Rammelsberg          Harz 
##           0.1           0.1           0.1           0.1           0.1 
##        E Alps       SE Alps    Erzgebirge      Slovakia E Carpathians 
##           0.1           0.1           0.1           0.1           0.1 
## 
## Group means:
##               Pb207_206 Pb208_206 Pb206_204 Pb207_204 Pb208_204
## Valais        0.8388941  2.070344  18.68180  15.66700  38.66768
## Rheinland     0.8500203  2.086230  18.37402  15.61783  38.33094
## Saarland      0.8512989  2.088591  18.34888  15.62006  38.32283
## Rammelsberg   0.8555200  2.087237  18.23999  15.60464  38.07128
## Harz          0.8465582  2.082291  18.45946  15.62697  38.43796
## E Alps        0.8123547  2.039076  19.36764  15.72043  39.47131
## SE Alps       0.8600777  2.108933  18.21319  15.66278  38.40752
## Erzgebirge    0.8523633  2.092856  18.30219  15.59937  38.30297
## Slovakia      0.8392078  2.076910  18.66394  15.66106  38.76039
## E Carpathians 0.8332847  2.063956  18.79096  15.65775  38.78277
## 
## Coefficients of linear discriminants:
##                  LD1        LD2        LD3        LD4        LD5
## Pb207_206 101.121423  107.21498 -6104.6626 -209.63829  3200.0906
## Pb208_206 144.347387 -643.57946  3427.6488  885.32589 -2501.5917
## Pb206_204  18.885003  -58.93502   102.8955   84.32811  -136.1844
## Pb207_204 -38.374191  -55.15135   305.3234  -28.64439  -168.6860
## Pb208_204  -8.793357   33.19614  -177.9727  -40.71934   136.1671
## 
## Proportion of trace:
##    LD1    LD2    LD3    LD4    LD5 
## 0.6157 0.2535 0.0808 0.0460 0.0040

Using the LDA to predict the regions of original points, we get a lot of error, see the confusion matrix below. How to read it: rows are the regions the point is from and columns are regions predicted based on the LDA.

sources_pred <- predict(sources_lda)
prediction <- with(sources_pred, data.frame(Region = sources$Region2,
                                            Predict = class, round(posterior, 2)))

knitr::kable(addmargins(xtabs(~fct_drop(Region) + fct_drop(Predict), prediction)), 
             caption = "Confusion matrix")
Table 2: Confusion matrix
Valais Rheinland Saarland Rammelsberg Harz E Alps SE Alps Erzgebirge Slovakia E Carpathians Sum
Valais 28 0 4 0 2 1 4 0 7 13 59
Rheinland 8 129 85 73 117 0 11 49 9 20 501
Saarland 1 27 41 7 23 0 1 2 1 0 103
Rammelsberg 0 3 14 97 0 0 0 4 0 0 118
Harz 34 35 10 0 195 0 0 15 10 0 299
E Alps 28 0 0 0 0 60 2 0 12 1 103
SE Alps 8 0 1 0 0 0 90 0 2 0 101
Erzgebirge 0 13 2 3 10 0 1 60 1 0 90
Slovakia 9 0 3 0 4 0 7 0 31 13 67
E Carpathians 4 0 0 0 3 0 0 0 18 44 69
Sum 120 207 160 180 354 61 116 130 91 91 1510
# overall success of the model in predicting correct region
round(mean(prediction$Region == prediction$Predict), 2)
## [1] 0.51
# individual correct predictions (diagonal)
round(prop.table(table(droplevels(prediction$Region),
                       droplevels(prediction$Predict)),
                 margin = 1), 2)
##                
##                 Valais Rheinland Saarland Rammelsberg Harz E Alps SE Alps
##   Valais          0.47      0.00     0.07        0.00 0.03   0.02    0.07
##   Rheinland       0.02      0.26     0.17        0.15 0.23   0.00    0.02
##   Saarland        0.01      0.26     0.40        0.07 0.22   0.00    0.01
##   Rammelsberg     0.00      0.03     0.12        0.82 0.00   0.00    0.00
##   Harz            0.11      0.12     0.03        0.00 0.65   0.00    0.00
##   E Alps          0.27      0.00     0.00        0.00 0.00   0.58    0.02
##   SE Alps         0.08      0.00     0.01        0.00 0.00   0.00    0.89
##   Erzgebirge      0.00      0.14     0.02        0.03 0.11   0.00    0.01
##   Slovakia        0.13      0.00     0.04        0.00 0.06   0.00    0.10
##   E Carpathians   0.06      0.00     0.00        0.00 0.04   0.00    0.00
##                
##                 Erzgebirge Slovakia E Carpathians
##   Valais              0.00     0.12          0.22
##   Rheinland           0.10     0.02          0.04
##   Saarland            0.02     0.01          0.00
##   Rammelsberg         0.03     0.00          0.00
##   Harz                0.05     0.03          0.00
##   E Alps              0.00     0.12          0.01
##   SE Alps             0.00     0.02          0.00
##   Erzgebirge          0.67     0.01          0.00
##   Slovakia            0.00     0.46          0.19
##   E Carpathians       0.00     0.26          0.64

Prediction of Regions based on data on points from Duchcov hoard.

duchcov_prediction <- predict(object = sources_lda, newdata = isotopes[, -7])
duchcov_prediction_df <- with(duchcov_prediction,
                              data.frame(id = isotopes[, "id"],
                                         kmeans = isotopes[, "kmeans"],
                                         predicted_region = class,
                                         round(posterior, 2)))

# posterior probabilities of individual predictions
DT::datatable(duchcov_prediction_df,
              caption = "Posterior probabilities of predicted sources")
duchcov_prediction_df %>% 
  select(-predicted_region) %>% 
  gather(key = "region", value = "probability", -id, -kmeans) %>% 
  group_by(kmeans, region) %>% 
  dplyr::summarize(mean_prob = mean(probability, na.rm = TRUE)) %>% 
  ggplot(aes(x = region, y = mean_prob)) +
  geom_col(fill = "white", col = "black") +
  facet_wrap(~kmeans, ncol = 1) +
  labs(x = "Region", y = "Mean posterior probability")

# matrix of counts of points in individual k-means clusters (rows)
# fitting to various regions (columns)
knitr::kable(addmargins(xtabs(~kmeans + fct_drop(predicted_region),
                              duchcov_prediction_df)), 
             caption = "Counts of points per different sources")
Table 3: Counts of points per different sources
Valais Rheinland Saarland Rammelsberg Harz SE Alps Erzgebirge Slovakia E Carpathians Sum
1 4 0 0 1 1 1 0 4 0 11
2 0 1 6 4 0 0 12 7 1 31
3 0 0 1 0 0 0 2 0 0 3
4 0 0 0 0 1 0 1 1 0 3
Sum 4 1 7 5 2 1 15 12 1 48
ggplot(isotopes, aes(Pb206_204, Pb207_204, label = kmeans)) +
  geom_text(aes(color = duchcov_prediction_df$predicted_region, 
                shape = duchcov_prediction_df$predicted_region)) +
  scale_color_brewer(palette = "Set1", name = "predicted\nregion") +
  labs(shape = "predicted\nregion") + lab_206_207
Source regions as assigned to Duchcov k-means clusters (numeric labels) by LDA

Figure 16: Source regions as assigned to Duchcov k-means clusters (numeric labels) by LDA

ggplot(isotopes, aes(Pb206_204, Pb208_204, label = kmeans)) +
  geom_text(aes(color = duchcov_prediction_df$predicted_region, 
                shape = duchcov_prediction_df$predicted_region)) +
  scale_color_brewer(palette = "Set1", name = "predicted\nregion") +
  labs(shape = "predicted\nregion") + lab_206_208
Source regions as assigned to Duchcov k-means clusters (numeric labels) by LDA

Figure 17: Source regions as assigned to Duchcov k-means clusters (numeric labels) by LDA

ggplot(isotopes, aes(Pb207_206, Pb208_206, label = kmeans)) +
  geom_text(aes(color = duchcov_prediction_df$predicted_region, 
                shape = duchcov_prediction_df$predicted_region)) +
  scale_color_brewer(palette = "Set1", name = "predicted\nregion") +
  labs(shape = "predicted\nregion") + lab_207_208
Source regions as assigned to Duchcov k-means clusters (numeric labels) by LDA

Figure 18: Source regions as assigned to Duchcov k-means clusters (numeric labels) by LDA

rm(list = c("med_d_all", "med_d_regions", 
            "euclidean_distance", "mahalanobis_distance",
            "filtered_regions", "sources_subset", 
            "sources_lm", "sources_lda", "sources_pred",
            "n_levels_sources", "prediction",
            "duchcov_prediction", "duchcov_prediction_df"))

Burial grounds

burials <- read_csv(here::here("data", "burial_grounds.csv"))
duchcov <- duchcov %>% transmute(brooch = if_else(str_detect(type_eng, "^brooch_"),
                                                  "brooch", "not brooch"),
                                 brooch = factor(brooch, levels = c("brooch", "not brooch")),
                                 id = id) %>%
  left_join(isotopes)

burials <- burials %>% mutate(brooch = if_else(str_detect(type_eng, "brooch_"),
                                               "brooch", "not brooch"),
                              brooch = factor(brooch, levels = c("brooch", "not brooch"))) %>% 
  select(-type_eng)
g_bur_206_207 <- ggplot(burials, aes(Pb206_204, Pb207_204)) +
  labs(color = "Burials", shape = "Duchcov k-means\ncluster") +
  lab_206_207 +
  theme(legend.position = c(0.84, 0.2)) +
  guides(shape = guide_legend(ncol=2), color = guide_legend(ncol = 2))
g_bur_206_208 <- ggplot(burials, aes(Pb206_204, Pb208_204)) +
  labs(color = "Burials", shape = "Duchcov k-means\ncluster") +
  lab_206_208 +
  theme(legend.position = c(0.84, 0.2)) +
  guides(shape = guide_legend(ncol=2), color = guide_legend(ncol = 2))
g_bur_207_208 <- ggplot(burials, aes(Pb207_206, Pb208_206)) +
  labs(color = "Burials", shape = "Duchcov k-means\ncluster") +
  lab_207_208 +
  theme(legend.position = c(0.84, 0.2)) +
  guides(shape = guide_legend(ncol=2), color = guide_legend(ncol = 2))

Convex hulls of isotope signals of artefacts from burials

g_bur_206_207 + geom_point(aes(color = site), alpha = 0.8, show.legend = FALSE) +
  geom_mark_hull(aes(color = site, fill = site), expand = unit(2.4, "mm"),
                 alpha = 0.1, show.legend = FALSE) +
  scale_color_brewer(palette = "Set1") + scale_fill_brewer(palette = "Set1") +
  geom_point(data = duchcov, aes(shape = kmeans), size = 2, alpha = 0.4) +
  facet_wrap(~site)
Distribution of data from different burials (LTB1 horizon) overlayed with Duchcov hoard

Figure 19: Distribution of data from different burials (LTB1 horizon) overlayed with Duchcov hoard

g_bur_206_208 + geom_point(aes(color = site), alpha = 0.8, show.legend = FALSE) +
  geom_mark_hull(aes(color = site, fill = site), expand = unit(2.4, "mm"),
                 alpha = 0.1, show.legend = FALSE) +
  scale_color_brewer(palette = "Set1") + scale_fill_brewer(palette = "Set1") +
  geom_point(data = duchcov, aes(shape = kmeans), size = 2, alpha = 0.4) +
  facet_wrap(~site)
Distribution of data from different burials (LTB1 horizon) overlayed with Duchcov hoard

Figure 20: Distribution of data from different burials (LTB1 horizon) overlayed with Duchcov hoard

g_bur_207_208 + geom_point(aes(color = site), alpha = 0.8, show.legend = FALSE) +
  geom_mark_hull(aes(color = site, fill = site), expand = unit(2.4, "mm"),
                 alpha = 0.1, show.legend = FALSE) +
  scale_color_brewer(palette = "Set1") + scale_fill_brewer(palette = "Set1") +
  geom_point(data = duchcov, aes(shape = kmeans), size = 2, alpha = 0.4) +
  facet_wrap(~site)
Distribution of data from different burials (LTB1 horizon) overlayed with Duchcov hoard

Figure 21: Distribution of data from different burials (LTB1 horizon) overlayed with Duchcov hoard

Kernel density estimates

# g_bur_206_207 +
#   stat_density2d(aes(fill = stat(nlevel)), geom = "polygon", alpha = 0.2, show.legend = FALSE) +
#   scale_fill_gradient(low = "white", high = "black") +
#   scale_color_brewer(palette = "Set1") +
#   geom_point(aes(color = site), alpha = 0.8, show.legend = FALSE) +
#   geom_point(data = duchcov, aes(shape = kmeans), size = 1.4, alpha = 0.4) +
#   facet_wrap(~site)
# 
# g_bur_206_208 +
#   stat_density2d(aes(fill = stat(nlevel)), geom = "polygon", alpha = 0.2, show.legend = FALSE) +
#   scale_fill_gradient(low = "white", high = "black") +
#   scale_color_brewer(palette = "Set1") +
#   geom_point(aes(color = site), alpha = 0.8, show.legend = FALSE) +
#   geom_point(data = duchcov, aes(shape = kmeans), size = 1.4, alpha = 0.4) +
#   facet_wrap(~site)
# 
# g_bur_207_208 +
#   stat_density2d(aes(fill = stat(nlevel)), geom = "polygon", alpha = 0.2, show.legend = FALSE) +
#   scale_fill_gradient(low = "white", high = "black") +
#   scale_color_brewer(palette = "Set1") +
#   geom_point(aes(color = site), alpha = 0.8, show.legend = FALSE) +
#   geom_point(data = duchcov, aes(shape = kmeans), size = 1.4, alpha = 0.4) +
#   facet_wrap(~site)

Density distributions of burials and Duchcov hoard artefacts

library(cowplot)
origin <- bind_rows(duchcov = duchcov[, c("Pb207_204", "Pb206_204", 
                                          "Pb208_204", "Pb208_206", 
                                          "Pb207_206", "brooch")],
                    burials = burials[, c("Pb207_204", "Pb206_204", 
                                          "Pb208_204", "Pb208_206", 
                                          "Pb207_206", "brooch")],
                    .id = "origin")

Distribution of brooches

g_bro_206_207 <- ggplot(origin, aes(Pb206_204, Pb207_204, color = origin)) +
  facet_wrap(~ brooch) +
  labs(color = "Origin") +
  lab_206_207
g_bro_206_208 <- ggplot(origin, aes(Pb206_204, Pb208_204, color = origin)) +
  labs(color = "Origin") +
  facet_wrap(~ brooch) +
  lab_206_208
g_bro_207_208 <- ggplot(origin, aes(Pb207_206, Pb208_206, color = origin)) +
  labs(color = "Origin") +
  facet_wrap(~ brooch) +
  lab_207_208
g_bro_206_207 +
  geom_point() +
  stat_smooth(method = "lm", se = FALSE)
Distribution of brooches and other artefact types in burials and Duchcov hoard

Figure 22: Distribution of brooches and other artefact types in burials and Duchcov hoard

g_bro_206_208 +
  geom_point() +
  stat_smooth(method = "lm", se = FALSE)
Distribution of brooches and other artefact types in burials and Duchcov hoard

Figure 23: Distribution of brooches and other artefact types in burials and Duchcov hoard

g_bro_207_208 +
  geom_point() +
  stat_smooth(method = "lm", se = FALSE)
Distribution of brooches and other artefact types in burials and Duchcov hoard

Figure 24: Distribution of brooches and other artefact types in burials and Duchcov hoard

References

Auguie, B., 2017. GridExtra: Miscellaneous functions for “grid” graphics.

D’Orazio, M., 2019. StatMatch: Statistical matching or data fusion.

Gombin, J., Vaidyanathan, R., Agafonkin, V., 2017. Concaveman: A very fast 2D concave hull algorithm.

Ling, J., Stos-Gale, Z., Grandin, L., Billström, K., Hjärthner-Holdar, E., Persson, P.-O., 2014. Moving metals ii: Provenancing scandinavian bronze age artefacts by lead isotope and elemental analyses. Journal of Archaeological Science 41, 106–132. https://doi.org/https://doi.org/10.1016/j.jas.2013.07.018

Maechler, M., Rousseeuw, P., Struyf, A., Hubert, M., Hornik, K., 2019. Cluster: Cluster analysis basics and extensions.

Müller, K., 2017. Here: A simpler way to find your files.

Pedersen, T.L., 2019. Ggforce: Accelerating ’ggplot2’.

R Core Team, 2020. R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria.

Venables, W.N., Ripley, B.D., 2002. Modern applied statistics with s, Fourth. ed. Springer, New York.

Vu, V.Q., 2011. Ggbiplot: A ggplot2 based biplot.

Wei, T., Simko, V., 2017. R package “corrplot”: Visualization of a correlation matrix.

Wickham, H., Averick, M., Bryan, J., Chang, W., McGowan, L.D., François, R., Grolemund, G., Hayes, A., Henry, L., Hester, J., Kuhn, M., Pedersen, T.L., Miller, E., Bache, S.M., Müller, K., Ooms, J., Robinson, D., Seidel, D.P., Spinu, V., Takahashi, K., Vaughan, D., Wilke, C., Woo, K., Yutani, H., 2019. Welcome to the tidyverse. Journal of Open Source Software 4, 1686. https://doi.org/10.21105/joss.01686

Wilke, C.O., 2020. Ggridges: Ridgeline plots in ’ggplot2’.

Wilke, C.O., 2019. Cowplot: Streamlined plot theme and plot annotations for ’ggplot2’.

Wong, J., 2013. Pdist: Partitioned distance function.

Xie, Y., 2020a. Knitr: A general-purpose package for dynamic report generation in r.

Xie, Y., 2020b. Bookdown: Authoring books and technical documents with r markdown.

Xie, Y., Cheng, J., Tan, X., 2020. DT: A wrapper of the javascript library ’datatables’.